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Abstract 

The 'Signal plus Noise' model for nonparametric regression can be 
extended to the case of observations taken at the vertices of a graph. 
This model includes many familiar regression problems. This article 
discusses the use of the edges of a graph to measure roughness in 
penalized regression. Distance between estimate and observation is 
measured at every vertex in the L2 norm, and roughness is penalized 
on every edge in the Li norm. Thus the ideas of total-variation pe- 
nalization can be extended to a graph. The resulting minimization 
problem presents special computational challenges, so we describe a 
new, fast algorithm and demonstrate its use with examples. 

Further examples include a graphical approach that gives an im- 
proved estimate of the baseline in spectroscopic analysis, and a simula- 
tion applicable to discrete spatial variation. In our example, penalized 
regression outperforms kernel smoothing in terms of identifying local 
extreme values. In all examples we use fully automatic procedures for 
setting the smoothing parameters. 



1 Introduction 



There are a number of statistical models that contain some sort of graphical 
structure. Examples include image analysis, disease risk mapping and dis- 
crete spatial variation. We focus on those for which penalized regression is 
appropriate, and can be thought of in terms of the 'signal + noise' framework. 

We consider the regression of a continuous response variable on one or 
more explanatory covariates. Often there is some sort of graphical structure 
in and between the observations, or some obvious neighbouring scheme that 
gives rise to a graph. We think of the locations of the observations as the 
vertices of the graph. The edges may be suggested by the neighbouring 
scheme or by the covariate values. We will see some examples in this section. 

A model for data on the graph (V, S), which has vertices in the set V and 
edges in the set £, is 

Data = Signal + Noise 
Hi = ft + (TZi, i e V. 

The noise terms, usually assumed to be independent realizations of 

a random variable with zero mean and unit variance. Under this model 
regression on a graph involves estimating the underlying signal values /j, for 
all vertices i in the set V. We use the edges to measure the complexity of 
the estimate. 

Figure [T] shows an example of regression on a graph: a small, noisy image 
with 64 pixels. The responses are the grey levels of the pixels, so each pixel is 
a vertex of the graph. A natural choice of edges connects each pixel with its 
neighbours, resulting in the graph superimposed on the left-hand image in 
Figure [T} Regression on this graph involves estimating the underlying signal 
image, which is displayed in the right-hand image. 

In this article we discuss penalized regression on the graph (V, S). Penal- 
ized regression fits an estimate that is close to the data, but penalises rough 
or complicated estimates. With an observation at every vertex, we can mea- 
sure the distance between observed and estimated values by the sum of the 
distances at each vertex. The complexity of the estimate can be measured by 
the differences between the estimated values at adjacent observations. This 
measurement is therefore the sum of absolute differences at each edge. 

We discuss the penalized regression estimate that minimises 

Q{f)■■=lT.^^{f^-yif+ E Kjlfj'M 
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Figure 1: Example of a graphical structure present in a regression situation. 
The noisy image (left) shows a suitable graph for regression, based on the 4- 
neighbourhood. On the noiseless version (right) only the edges in the active 
set are shown. 



for given weights -iOj > 0, for i G V, and smoothing parameters Ajj > 0, for 
G S. This is the sum of a term that penalises distance from the data 
plus a term that penalises roughness. The first term is the distance from 
the data, measured at every vertex in the L2 norm. The second term is the 
weighted sum of roughness at every edge, measured in the Li norm. Our 
model allows for a different weight or smoothing parameter at each vertex 
and each edge. 

Although it is usual, in graph theory, to denote the edges by unordered 
pairs, we will treat £ as a set of ordered pairs for convenience of notation. 
This does not mean that (V, S) is a directed graph, since the ordering can be 
completely arbitrary. We do, however, consider there to be at most one edge 
that joins any pair of vertices. This is because it makes no sense to split the 
penalty between two vertices over more than one edge. 

1.1 Motivating examples 

As a first motivating example, we consider the problem of nonparametric re- 
gression between two continuous variables. Suppose we have response obser- 
vations ui, . . . ,yn taken at strictly ordered design points. There is a natural 
neighbouring structure: the first observation is adjacent to the second, the 
second is next to the third, and so on. Hence a natural graphical structure 
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for this example is given by {V2,S2), where 

V2 = {l,2,...,n} andf2 = {(l,2),(2,3),...,(n-l,n)}. 

The minimization of Q{f) provides an estimate of at every observation. 
If we let Wj = 1 for all i G V2 and use the convenient shorthand Aj = Xi^i+i, 
then Q{f) becomes 

-1 n n—1 

.E(l/«-/^)' + E^^I/m-/.| (1) 
^ i=i i=i 

and the roughness penalty is the weighted total variation of the estimate. 

Total variation can be extended to higher dimensions to tackle, for exam- 
ple, image analysis. An image can be thought of as an rii x n2 grid of pixels, 
with observations at each pixel. Then the set of vertices of the graph is the 
set of pixels 

Vi = {{ii,i2) : ii = I, . . . ,ni,i2 = I, . . . ,n2} . 

There are a number of neighbouring structures in use in image analysis. 
The simplest is the 4-neighbourhood (Winkler 2003, p. 57) in which a pixel 
has neighbours immediately above, below, to the left and to the right. This 
neighbouring scheme suggests the set of edges 

Sa = {{{11,12), {ii,i2 + l)) eVl}u {{{ti,i2),{ii + l, 12)) ev!} . 

Figure [T] shows a picture of this graph. 

Using the graph (V4,£^4), we can find a denoised image by minimising 
Q{f)- Now the roughness penalty is a measure of the total variation in the 
horizontal direction plus the total variation in the vertical direction. 

1.2 Review of existing methods 

Mammen and van de Geer (1997) first discussed the estimator obtained by 
minimising ([T]) where A is a global smoothing parameter. Some authors have 
allowed the smoothing parameters to differ. For example Davies and Kovac 
(2001) alter them during their local squeezing procedure. There are fast 
algorithms that find the solution to this specific minimization problem, in 
particular the taut string algorithm of Davies and Kovac (2001), which has 
0{n) computational complexity. 
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The estimator that minimises ([T|, in which error is measured in the L2 
norm and roughness in the Li norm, is a nonparametric version of the least 
absolute shrinkage (Lasso) estimator (Tibshirani 1996). Therefore the esti- 
mator that minimises Q{f) can be seen as a generalization of the nonpara- 
metric Lasso to any graph. There are other methods of penalized regression, 
with different roughness measures, that have been applied to observations 
on a graph. Belkin et al. (2004) describe an algorithm for Tikhonov reg- 
ularization. Their algorithm measures roughness at every edge in the L2 
norm. 

Koenker and Mizera (2004) employ a penalty term for triograms. Given 
irregularly-spaced observations, they create a graph by computing a Delau- 
nay triangulation of the observations. Their penalty term is also a weighted 
sum over all edges of the triangulation. However they measure roughness 
as the squared (L2) differences between gradients. Jansen et al. (2009) have 
discussed wavelet lifting as a method for regression on a graph. Like Koenker 
and Mizera, the authors use a Delaunay triangulation. 

Our algorithm is based on ideas similar to active set methods, which 
features in a number of algorithms, including that of Goldfarb and Idnani 
(1983). 

2 Optimization Algorithm 

In Theorem [1] below we give a sufficient condition for / to minimize Q{f) and 
in Subsection |2.2| we present a fast algorithm for finding such a minimizer. 
The minimum exists because Q{f), as a sum of convex functions, is convex 
itself. Therefore any local minimum of Q{f) will be a global minimum, 
and the set of all global minima will be a convex set. In the important case 
where all the weights Wi are strictly positive a unique global minimum exists, 
because Q{f) is strictly convex. 

2.1 Sufficient condition for minimization 

The solution to the minimization problem is characterized by regions of con- 
stant value, that is, sets of neighbouring vertices that share the same value 
of /. We define such regions by use of a special active set of edges, indexed 
by A. This consists of edges (i, j) G £ for which /j = fj, such that the graph 
{V,A) is acyclic. Note that, unlike the definition of active set used in many 
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optimization algorithms, there can still be edges (i, j) ^ A such that /, = fj. 

We will denote by TZ{k) the entire region of constant value that contains 
the vertex k. More formally let 

7l{k) = {i G V : i is connected to k in (V, A)} . 

We will also denote by A{k) that subset of the active set that holds the region 
7l{k) together, so 

A{k) = {{z,j)eA:ze'JZ{k),jenk)}. 

Figure [l] shows an example of an active set in the graph (V4,£^4). Note how 
the edges in the active set join together vertices that share the same value, 
thus holding together regions of constant value. 

Since (V,^) is acyclic, the graph (7l{k),A{k)) is a connected, acyclic 
graph. This feature is crucial as it allows the region 7l{k) to be split into 
two subregions by removing just one edge (/, J) from A{k). We will denote 
these two subregions by 71{I, J) and 7^(J, /), where 

n{I, J) = {ie n{I) : i is connected to / in (V, A \ (/, J))} . 

We associate with the region or subregion Tl{a) (where a = k or a = I, J) 
the quantities 

fna= ^ I WiUi + ~ (^jAhi and Ma = ^i- 



Theorem 1 Suppose there exists a fit f and set of edges A such that fi = fj 
for all G A and (V, ^) is acyclic. Also suppose there are values Ctj such 
that 

Ci,j = sign(/j - fi) or fi = fj for all {i,j) E £, (2) 

c„ = ±l e A (3) 

Ukfk = rrik for all k eV, (4) 
and \ui^jfi - {mi J - cijXi^j)\ < Xi^j for all (/, J) G A. (5) 

Then f minimises Q{f)- 
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A proof is given in the Appendix. 

These conditions can be shown to be similar to the taut string of Davies 
and Kovac (2001). When the graph is {V2,£2) the condition ([s]) describes a 
tube and Q describes a string threaded through the tube and pulled taut 
(Mammen and van de Geer 1997). 

2.2 Algorithm 

The algorithm that we describe can be considered to search for the graph 
(V, A) and vector c described in Theorem [l} At any point during the algo- 
rithm the current value of c defines a working objective function 

Q{f;c) := ^Y.^i(fi-yif + H \ci,j\Kj\fj - fi\- 

When we have satisfied the constraints ^ then Q{f; c) = Q{f)- The current 
value of / always minimises c) so when ^ holds it also minimises Q{f)- 
For / to minimize Q{f; c) a slightly modified version of Theorem [l] tells us 
that we must have 

sign(cij) = sign(/j - /i) when fi ^ fj, (6) 

^ and Q must hold, and 

< - sign(c/,j)(M/,j// - mij) < 2|c/,j|A/,j for all (/, J) G A. (7) 

We start with c = 0. In this initial case Q{f]c) = |Z]jev^i(/i ~ 
so we start with f = y as this is the minimizer. Our algorithm gradually 
increases the penalty on each edge: at each iteration c^^i moves from to 
±1 for one particular edge (fc, I) G 8. Once (|2| is satisfied for an edge, then 
it remains satisfied. The algorithm stops when ^ is satisfied at all edges. 
This event will occur in a finite time, as stated by Theorem [2] below. 

Theorem 2 The algorithm described here will terminate in a finite time, 
and finds a minimizer of Q{f), for any graph, data, weights and smoothing 
parameters. 

The proof is contained in the Appendix. 
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We now give precise details about each iteration of our algorithm. At each 
iteration we start with / that minimises Q{ - ]c) and move to / = f + Af that 
minimises Q{-]c + Ac). We start each iteration with an edge {k,l) chosen 
such that fk 7^ fi and Ck^i ^ sign(/; — fk). We want to move in the direction 
that satisfies Ck^i = sign(/; — f^). The condition Q tells us we need Ac such 
that Acij = for ^ {k,l) and Ack,i = UkAfk/Xk,i- 

It is clear that as c changes, / must change to compensate. As Ck,i changes, 
the penalty on the edge {k, I) increases, so we must reduce \ fi — fk\ in order 
to move to the minimum of Q{ - ; c + Ac). 

This change must take place within the constraints of the active set. 
Therefore we must alter and // uniformly on the whole of the regions 
7l{k) and 7l{l). This means fi = fi + Afk for i G 7i{k) and fi = fi + Afi for 
i G 7l{l). In order to preserve Q we must have fi = fi for i G 7l{k) U 7l{l). 
So the regions 7l{k) and 71(1) will move closer together in value. 

As the regions move closer together there may need to be changes to the 
active set. To make sure that these changes happen we will increase the 
penalty on {k, I) in small steps. Specifically we will change / and c only by 
enough to trigger the first change in the active set. 

In this subsection we will discuss the possible changes to the active set 
as 7l{k) and TZ{1) move closer together. There are four possible events that 
could happen: no change, merging of TZ{k) and 7^(/), amalgamation with a 
neighbouring region, and splitting a region. 

For each of these events we give, below, the associated values of Afk, Afi 
and Ack,i- We also describe appropriate adjustments to the active set. In 
order to trigger the first change in the active set, the algorithm chooses the 
event for which \Afk\ and \Afi\ are both smallest. The Appendix contains 
proofs of these values. 

Once the no change or merging steps are complete, we can set Ck,i = 
sign(/; — fk) and the iteration is over. We choose another edge {k,l) for 
which fk 7^ fi and Ck,i 7^ sign(/; — fk) and iterate again. If there is no such 
edge then the algorithm stops, since Q{f; c) = Q{f). Once amalgamation or 
splitting has taken place, we proceed to further reduce \fi — fk\, now altering 
/ uniformly on a changed region. 

2.2.1 No change to active set 

There may be no disruption necessary to the active set before Ck,i + Ac^,; = 
sign(/; — fk) is satisfied. This means that we have Ukfk = and uifi = mi, 
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and still holds for all (/, J) G A{k) U A{1). 

This event can only occur if > and ui > 0. The associated changes 
in fk and fi are 

A r (sign(/z - /fc) - Ck,i)Xk,i . . , (sign(/fc - fi) + Ck,i)Xk,i 
Afk = and Afi = . 

Uk Ul 

2.2.2 Merging of the two regions 

Before we reach the target value of Ck,i = sign(/; — fk), the regions 7l{k) and 
TZ{1) might meet each other in value. This would mean that fk = fi and 
\fi ^ fk\ can be decreased no further. The changes in fk and fi are 

AA = -^{fi - fk) and Afi = -^{fk - fi). (8) 

Uk + Ui Uk + Ul 

U Uk = Ul = then we can choose Afk = {fi - fk)/2 and Afi = {fk - fi)/2. 

Since we now have fk = fi we merge the two regions 7l{k) and 71(1) by 
adding {k, I) to the active set. If there are other edges that join 7l{k) and 
TZ{1), then they will not be added to A, even though they share the same 
value of /. This will ensure that the graph (V, A) remains acyclic. 



2.2.3 Amalgamation of a neighbouring region 

Before we reach the minimizer of (5( ■ ; c + Ac), the value of / in the region 
TZ{k) may meet the value in a neighbouring region that is not TZ{1). More 
formally there may be a vertex i e TZ{k) and K ^ TZ{k) U 7^(/) for which 
Ci^K 7^ or CK,i ^ 0, and fk< fx < fi or fk> fK> fi- 

This event is only possible if ui > 0, or if ui = Uk = 0, or if = and 
fK = fk- The changes to / associated with this event are 

A/. ^ /. - A a.d Af, ^ { - >;^,^^ (9) 

We now have fi = fx and if we proceed to alter / we may break the 
constraint (|6| at the edge (i, K) or {K, i). Therefore, if sign(Q,ii-) = sign(A/fc) 
or sign(cx,j) = — sign(A/fc), we add this edge to the active set. This will 
amalgamate the region 'R-{K) into TZ{k). 

If there are other edges that join TZ{k) and 'R-{K) then they will not be 
added to A. This ensures that the graph (V, A) remains acyclic. Of course 
a similar amalgamation might occur with a neighbour of TZ{1). 
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2.2.4 Splitting a region 



Before arriving at the minimizer of Q( ■ ; c + Ac) we must test whether an 
edge (/, J) G A{k) U A{1) should be removed from the active set. This will 
split the region lZ{k) or 1Z(1) into two subregions. If the split takes place it 
may be necessary to swap the sign of c/^j, in order to preserve the constraint 
^ at {I, J). This will not affect Q{f;c). We use condition ([T]) to tell us 
when an edge should be removed, once we have accounted for the possible 
sign change. 

This event can only occur if > and ui > 0. The values of / and c at 
which (J, J) G A{k) should be removed are given by 

An -ui,jfk- ci^jXij±sign{fi- fk)\ciJXjj Wfc 
A/fc = and Afi = A/^, 

Ul,J Ul 

with + for k G J, /) and — for A; G 7^(/, J)- The corresponding values for 
(/, J) G A{1) are obtained by swapping k and /. 



3 Computational Complexity 

We now discuss the computational complexity of our algorithm in the setting 
of image analysis, in which the graph is {ViyS^). For the sake of simplicity 
we consider a square image, letting V4 be an x 77 grid of vertices. We 
are interested in the computational complexity in terms of the number of 
observations, or vertices, n. So n = i]"^ and the set £4 contains 2n — 2n^/^ 
edges. 

Suppose we were to use a generic active set method to minimize c) 
subject to ([2]). This would be very computationally expensive, mainly be- 
cause we may need to try all possible combinations of c in {—1, i}2n-2ni/2^ 
which leads to exponential complexity. Our algorithm does not need to try 
all combinations of c. In fact once Cfc^^ = sign(/; — /fc) is satisfied it will re- 
main satisfied until our algorithm stops. Therefore we only have to consider 
each edge once when satisfying ([2]). So we need only perform 0{n) iterations 
instead of 0(22"" 2"'^'). 

In addition, our algorithm does not need to check all possible active sets 
every time we add an edge. In the process of satisfying ([2]) for one edge we 
may need to change the active set many times, through repeated splitting 
or amalgamation. Since |// — decreases monotonically, once an edge has 



9 



been removed from A{k) or A{1) it cannot be included again during this 
iteration. Therefore, during one iteration, every edge may be added once, 
and removed once, from the active set. So our algorithm considers at most 
2(2n — 2n^f'^) + 1 active sets per iteration. 

Finally, for each of these active sets we will need to make some calcu- 
lations. It is possible to calculate m^, and u/^j, m/^j for all (/, J) G 
A{k) U A{1) without visiting a vertex in TZ{k) U 'R-{1) more than twice. The 
algorithm must check for possible neighbouring regions to amalgamate with. 
It must also check condition ([7| at every edge in A{k) and A{1). Since 
{lZ{k),A{k)) and {1Z{1),A{1)) are connected, acyclic graphs, there will only 
be \R.{k)\ — 1 and |7^(0I ~ 1 edges to check. Therefore the complexity of 
the calculation is OiXlZ{k) \ + |7^(/)|). This is at most 0(n), compared with 
O(n^) for methods based on matrix inversion, such as that of Goldfarb and 
Idnani (1983). 

We can reduce the computational complexity even further by working 
with small sub-images that gradually increase in size. We control the order in 
which the edge constraints (|2| are satisfied in order to keep \R.{k)\ and |7^(/)| 
as small as possible. Here we describe an implementation of our algorithm 
in which the maximum size of a region grows dyadically. For the sake of 
simplicity we will consider 77 to be an integer power of 2. It is easy to adapt 
this method for other values of 77, and for non-square images. 

The edge constraints are satisfied in stages, there being \og2'n stages in 
total. At stage p we consider those edges in the set 

{((z, 2^q - 2^-1), (7, 2^q - 2^-' + 1)) G £4 : g = 1, • • • , r//2^} 

followed by those in the set 

{{{2Fq - 2^-\t), {2Fq - 2^-' + E 8, : q = I, . . . ,r//2^} . 

The effect is that as the edges are considered the graph of satisfied edges 
grows dyadically. At the first stage the graph looks like pairs of vertices, 
followed by squares of 2 x 2 vertices. At the second stage the graph looks like 
connected rectangles of 2 x 4 vertices, followed by squares of 4 x 4 vertices. 
The process continues until all edges are satisfied and the whole square of 
1] X 7] vertices are connected. 

The advantage of this implementation is our algorithm will never allow 
an edge (fc, /) in the active set if Ck^i = 0. Therefore 7l{k) and 7^(/) can never 
be larger than the rectangle connected by satisfied edges that contains k and 
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/. At stage p this rectangle will contain at most 2^^ vertices. Furthermore in 
the process of satisfying Ck,i = siga{fi — fk), the active set will only change on 
edges inside this connected rectangle. So there are at most 2(2^^"''^ — 2^"^^) + 1 
active sets to consider. 

It is possible to find the total computational complexity of this imple- 
mentation. At every stage we must satisfy constraints on 0{rf2^'P) edges. 
For each of these edges we may have to check 0(2^^) active sets and for each 
active set perform 0(2^'') calculations. Therefore the overall complexity is 



4 Examples 

4.1 Achieving a constant baseline 

The data shown in Figure |2] are an excerpt from the spectroscopic analysis of 
a gallstone. Looking at the data, it seems reasonable to think of the points 
as having been generated by a function that is a fiat baseline with occasional 
spikes. Furthermore we have information about the correct location and 
number of spikes (Davies and Kovac 2001). 

The left-hand plot in Figure [2] shows an estimate obtained by minimising 
([T]). The smoothing parameters Ai, . . . , A„_i were chosen by local squeezing, 
which aims to arrive at the smoothest function that satisfies the multires- 
olution condition The smoothing parameters are only reduced in intervals 
where the multiresolution condition is not satisfied. The estimates also show 
a mean correction: after running our algorithm we reset to the mean of 
the observations in TZ{i), for all i. See Davies and Kovac (2001) for more 
details. 

The estimate in Figure [2] identifies all the spikes. However the left-hand 
estimate has not identified the constant baseline well. Outside of the spikes, 
at the fiat parts of the estimate, the fitted function takes many different 
values. 

We propose a different graph that enables the algorithm to find a better 
estimate of the constant baseline. We introduce a new vertex, indexed n + 1. 
This corresponds to a dummy observation with value yn+i = 0. We set 
the weight Wn+i = 0, so that the value of yn+i cannot influence the fitted 
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Figure 2: Scatterplots of data from spectroscopy. The solid lines show the 
function fitted by means of a total variation penalty (left) and the improved 
estimate of the baseline (right). 



function /i, . . . ,/«• This new vertex is connected to the rest of the graph 
with n new edges. One new edge connects each existing observation to the 
dummy observation. 

The idea is that the baseline regions (those observations or vertices that 
are not at a spike) will be joined together via the dummy vertex. All of 
the baseline regions can be joined into one region. The result is a constant 
baseline everywhere that there is not a bump. The estimate of the baseline 
value will also improve, since the region contains more observations. 

It is assumed that there are more observations in the baseline region than 
at a spike, so the dummy vertex will join the baseline region and not another 
region. 

It remains to fix the values Ai^^+i, . . . , A„^„+i. With no prior knowledge 
about the location of the spikes, we set Ai^„+i = ■ ■ ■ = A„^„+i = A;,. By using 
equal smoothing parameters we will not encourage any particular vertex to 
join the baseline region. The other smoothing parameters, Ai, . . . , A^-i are 
still chosen by local squeezing. We suggest setting A;, = min(Ai, . . . , A„_i) so 
that no vertex will be influenced by the baseline more than its neighbours. 

It is easy to see, in the right-hand plot of Figure |2| the improvement that 
this graph causes at the baseline. 
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4.2 Image analysis 

Figure |3] shows, on the left, a noisy image that was used as an example by 
Polzehl and Spokoiny (2000). This example demonstrates the use of our 
algorithm in the case where the graph is (V4,£^4), which is suggested by the 
4-neighbourhood. 

This particular image exhibits areas of solid colour, with sharp discon- 
tinuities between them. We would expect to see this in many images. Our 
algorithm works well on this kind of image, because the areas of solid colour 
can be represented by regions of constant value. 

There are many proposed methods for choosing the smoothing parame- 
ters. As, at this point, we are only interested in demonstrating our algorithm, 
we have employed a simple method suggested by Rudin et al. (1992). It uses 
a global smoothing parameter. A, and is based around an estimate of the 
global variance, a^. Of course our algorithm allows different smoothing pa- 
rameters at every edge, so we can make use of more complicated methods if 
we wish. 

In order to find the simplest image for which the residuals behave as ex- 
pected, we increase A until Z]j6V4(/j = o"^|V4|. According to Chambolle 
(2004) this value of A will always exist. 

Of course we require an estimate of that is independent of the residuals. 
We can use, for example, one similar to that proposed by Davies and Kovac 
(2001): 

1-48 , , / X ^ X 

a = median (lyj - yi\ : (z,j) E £.4) . 

The output of our algorithm, the image estimated by use of the graph 
(V4,£^4), is shown in the right-hand image of Figure |3} 

4.3 Irregularly-spaced data 

We generated 1000 covariates uniformly on [0,1] x [0,1]. At each of these 
points we calculated a value from the function 

/(xi,X2) = exp(-100((xi- 0.5)2 + (X2- 0.5)2)) 

- exp(-1000((xi - 0.25)2 + (xs - 0.25)^)) 

- exp(-1000((a;i - 0.75)^ + (xs - 0.75)^)). (10) 

This function describes a surface with a broad bump at (0.5,0.5) and two 
sharper, inverted bumps at (0.25,0.25) and (0.75,0.75). To each of these 
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Figure 3: Noisy (left) and denoised (right) versions of the image of Polzehl 
and Spokoiny (2000). 



values we added Gaussian noise with zero mean and standard deviation 0.05 
to make 1000 noisy response observations. The noisy surface is shown in 
Figure |4} 

In order to calculate an estimate for / the Delaunay triangulation was 
used to connect the irregularly spaced covariates by a graph, see Figure |4j 

For the sake of comparison, Figure |4] also shows a kernel estimate ap- 
plied to the data. We chose the global bandwidth that minimises the true 



squared error between the kernel estimate and the function given by (10). 
So this can be thought of as the 'best' global-bandwidth kernel estimate. Al- 
though it identifies the three bumps, it also exhibits many additional bumps 
in locations where the signal function is practically flat. 

The bottom right plot in Figure |4] shows the output of our algorithm, the 
result of minimising Q{f) on the graph given by the Delaunay triangulation. 
We chose a global smoothing parameter by the same method as the image 
analysis example. This estimate identifies the three signal bumps but does 
not suffer from the introduction of extra bumps. There is a large region of 
constant value where the signal function is flat, so the estimate is also fiat in 
these locations. 
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Figure 4: Example of irregularly-spaced data. The noisy simulated data is 
shown (top left) together with a kernel estimate (bottom left). Note the 
presence of many additional bumps in the kernel estimate. The Delaunay 
triangulation (top right) shows the location of vertices, and the edges of 
the graph that we obtain. The final plot (bottom right) shows the estimate 
obtained by minimising Q{f) over the vertices of this graph. 
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A Appendix: Proofs 



A.l Proof of Theorem [I] 

We will show that Q, ^ and J7|) are sufficient for / to minimize Q{f;c). 
Theorem [l] easily follows when ([2| also holds. 

The problem of minimising Q{f]c) can be posed as a constrained opti- 
mization problem with objective function 

minimized subject to CijXijVij < CijXij{fj — fi) and CijXijVij < for all 

The Karush-Kuhn-Tucker conditions (see for example Bazaraa, Sherali 
and Shetty 1993, chap. 4) give a sufficient condition for / and f to be a 
solution. We require the existence of Lagrange multipliers fiij > and 
/i-j > such that fiij = if Ci^Vij < Ci^jifj - fi) and //■ ,,• = if Vij ^ 0, 
where 2cijXij = CijXijfiij + CijXijfi'i j and 

Wi{fi-yi)- CijAij + Y 

When (g holds c,,,(/,- - /.) > > CijVij and hence /ijj- — if fj 7^ /j. 
Otherwise the non-negativity requirements on /ij .,■ and /i^ j imply < /ijj < 2. 

Now suppose there exists an active set A such that (V, A) is acyclic, 
and ^ and ([T]) hold. The system of equations in (A.l) is equivalent to 



the system of equations obtained by summing (A.l) over all regions and 
subregions defined by A. This system is: for every a = A; G V or a = (J, J) e 
A, 

Uafi-ma = - Y Y CijAij/Xij - Y Cj,iAj-j/iij , / G 7^(a), 
^e7^(a) \i:(i,j)e£ r-(j,i)<^£ J 

When ^ and ^ hold appropriate Lagrange multipliers exist for the above 
system of equations to be sufficient for / to minimize (5(/; c). Namely /x/^j = 
— {uj^jfi — mi^j) / ci^jXi^j if (J, J) E A and = otherwise. 
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A. 2 Alterations of the active set 



In this subsection we prove the different values of A/^, Afi and Ack,i asso- 
ciated with the events described in Subsection 12.21 

The condition Q tells us that Ukfk = rnk and uji = rrii, and also Uk{fk + 
A/fe) = mk + Ack,i\k,i and ui{fi + Afi) = mi - Ack,iXk,i- Combining these 
equations we see that we must have 

UkAfk = Ack,i\k,i, (A.2) 
uiAfi = -Ack,i\k,h (A.3) 



A.2.1 No change to active set 

If there are no necessary changes to the active set, then Ck,i will reach the 
target value of sign(/z - fk). There fore A ck,i = sig n(/; - fk) - Ck,i 7^ 0. The 
values of Afk and Afi follow from (A.2) and (A. 3) respectively, as does the 
requirement that Uk > and ui > 0. 



A. 2. 2 Merging of the two regions 



The two regions 7l{k) and 71(1) will merge when fk+Afk = fi+Afi. Provided 
that Uk > and ui > 0, combining the above equation with (A.2) and (A. 3) 
gives U Uk = ui = then we can set fk = fi equal to any value that we 
choose, such as the mean and median value {fk + fi)/2. 



A. 2. 3 Amalgamation of a neighbouring region 

Given a suitable vertex K, the two regions TZ{k) and TZ{K) will amalgamate 
when fk + Afk = fx- When > the values in ([9]) follow immediately from 
(A.2) and (A. 3). When = equating (A.2) and (A. 3) shows that either 
Mfc = or Afk = 0. In either case it makes little sense to alter fi, so we let 
^fi = 0. 



A. 2. 4 Splitting a region 

Suppose we split 7l{k) by removing (/ 
this happens satisfies ([T]) in equality. 
k G 71(1, J). We will need to swap the 



, J) from A. The value of fk at which 
Without loss of generality suppose 
sign of c/,j if sign(c/,j) = sign(A/fc) = 
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sign(/; — /fc). Once this is taken into account m/ j becomes m/ j — cjjXjj — 
sign(/; - fk)\ci,j\XLj and becomes 

< \Afk\ + sign(/; - fk){ui,jfk - mi^j + ci^jXij) + |c7,j|A/,j < 2|c/,j|A/,j. 



The value for Afk follows when the upper limit is satisfied in equality. If 
Uk = then uij = so ([T]) will never change when fk changes. If = 
then Afk = from equating ( |A.2 ) and (A. 3). Clearly for fk to change and 
a split to occur we must have Uk > and ui > 0. The value for A// follows 
from equating (A. 2) and (A. 3). 



A. 3 Proof of Theorem d 

We will show that the objective function at the end of each iteration, Q{f + 
Af;c + Ac), is never less than the objective function at the start of the 
iteration, Q{f] c). Since / minimises c) and Acij = except for = 
{k, I), we have 

Q(/ + A/;c + Ac)-Q(/;c) 
= Qif + A/; c) - Q(/; c) + \Ack,i\Xk,i\fi + Afi - fk - Afk\ 
> \Ack,i\Xk,i\fi + Afi-fk-Afk\>0. 

Equality can only occur when Ack^i = or fi + Afi = fk — Afk- So the 
only time that Q{f; c) does not increase is during merging or amalgamation. 
Therefore an edge cannot be removed from the active set without an increase 
in Q{f]c). This means that the algorithm never visits the same value of c 
and A twice, and will always arrive at the situation described in ^ and 
terminate. 
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